Rotational Properties of Vocal Tract Length Difference in Cepstral Space

Authors

  • Daisuke Saito
  • Nobuaki Minematsu
  • Keikichi Hirose

Abstract

In this paper, we prove that the direction of cepstrum vectors strongly depends on vocal tract length and that this dependency is represented as rotation in a cepstrum space. In speech recognition studies, vocal tract length normalization (VTLN) techniques are widely used to cancel age and gender differences. In VTLN, a frequency warping is often carried out, and it can be modeled as a linear transform in a cepstrum space: ĉ = Ac. In this study, the geometric properties of this transformation matrix A are clarified using n-dimensional geometry, and it is shown that the matrix can be approximated as a rotation matrix. Further, a new method is proposed for a better approximation: using the eigenvalues of A, its quasi-rotational distortion is factorized into multiple true rotation operations and multiple magnification operations. This decomposition resolves the intrinsic ambiguity of a rotation angle computed from the inner product, and it describes in detail the geometric properties of the transformation caused by vocal tract length normalization. Experimental results using real and resynthesized speech samples demonstrate that the difference between cepstrum vectors extracted from different speakers is represented as rotation and magnification, and that the eigenvalue-based decomposition captures it precisely.
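The chain of ideas in the abstract (frequency warping as a linear map ĉ = Ac, a rotation angle read from the inner product, and an eigenvalue split into rotations and magnifications) can be illustrated with a small numerical sketch. It is not the authors' implementation. It assumes a first-order all-pass (bilinear) warping as a stand-in for the VTLN warping, builds A by projecting the warped log spectrum back onto a truncated cosine basis (so A is only approximate at the highest orders), excludes c_0, and reads each complex-conjugate eigenvalue pair r·e^{±iφ} as a rotation by φ combined with a magnification r inside a 2-D invariant plane. The function names and the parameters `alpha`, `order`, and `n_grid` are hypothetical.

```python
import numpy as np


def allpass_warp(omega, alpha):
    """Warped frequency of a first-order all-pass map; alpha = 0 gives the identity.

    Assumption: used here only as a stand-in for a VTLN frequency warping.
    """
    return omega + 2.0 * np.arctan2(alpha * np.sin(omega), 1.0 - alpha * np.cos(omega))


def warping_matrix(alpha, order, n_grid=4096):
    """Hypothetical helper: matrix A (order x order) with c_hat = A @ c for c_1..c_order.

    Log spectrum: L(w) = c_0 + 2 * sum_m c_m cos(m w). The warped log spectrum
    L(beta(w)) is projected back onto cos(k w), k = 1..order, with a midpoint
    rule on [0, pi]. n_grid must exceed order for the cosine basis to stay orthogonal.
    """
    omega = (np.arange(n_grid) + 0.5) * np.pi / n_grid   # midpoints in (0, pi)
    beta = allpass_warp(omega, alpha)
    k = np.arange(1, order + 1)
    basis_out = np.cos(np.outer(k, omega))        # cos(k w), shape (order, n_grid)
    basis_in = 2.0 * np.cos(np.outer(beta, k))    # 2 cos(m beta(w)), shape (n_grid, order)
    # (1/pi) * integral over [0, pi] ~ (1/pi) * (pi / n_grid) * sum = sum / n_grid
    return basis_out @ basis_in / n_grid


def inner_product_angle(c, A):
    """Single angle (radians) between c and A @ c from the normalized inner product."""
    c_hat = A @ c
    cos_theta = np.dot(c, c_hat) / (np.linalg.norm(c) * np.linalg.norm(c_hat))
    return float(np.arccos(np.clip(cos_theta, -1.0, 1.0)))


def rotation_magnification(A):
    """Split the eigenvalues of A into rotation angles arg(lambda) and magnifications |lambda|.

    A conjugate pair r * exp(+/- i * phi) acts, in a suitable basis of a 2-D
    invariant plane, as a rotation by phi scaled by r; a real eigenvalue only scales.
    """
    eig = np.linalg.eigvals(A)
    upper = eig[eig.imag >= 0]    # one member of each conjugate pair, plus real eigenvalues
    return np.angle(upper), np.abs(upper)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    c = rng.standard_normal(16) / np.arange(1, 17)   # hypothetical cepstrum c_1..c_16
    A = warping_matrix(alpha=0.1, order=16)
    print("inner-product angle (rad):", inner_product_angle(c, A))
    angles, mags = rotation_magnification(A)
    print("rotation angles:", np.round(angles, 4))
    print("magnifications:", np.round(mags, 4))
```

For alpha = 0 the matrix reduces to the identity, so the inner-product angle and every eigenvalue angle are zero. A nonzero alpha tilts the cepstrum vector; the inner-product view summarizes this as one angle that depends on the particular c, whereas the eigenvalue view reports one angle and one magnification per invariant plane.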

Similar resources

Feature space normalization in adverse acoustic conditions

We study the effect of different feature space normalization techniques in adverse acoustic conditions. Recognition tests are reported for cepstral mean and variance normalization, histogram normalization, feature space rotation, and vocal tract length normalization on a German isolated word recognition task with large acoustic mismatch. The training data was recorded in clean office environmen...

Integrating Complementary Features from Vocal Source and Vocal Tract for Speaker Identification

This paper describes a speaker identification system that uses complementary acoustic features derived from the vocal source excitation and the vocal tract system. Conventional speaker recognition systems typically adopt the cepstral coefficients, e.g., Mel-frequency cepstral coefficients (MFCC) and linear predictive cepstral coefficients (LPCC), as the representative features. The cepstral fea...

New transformations of cepstral parameters for automatic vocal tract length normalization in speech recognition

This paper proposes a method to transform acoustic models (HMM Gaussian mixtures) that have been trained on a certain group of speakers for use on speech from a different group of speakers. Cepstral features are transformed on the basis of assumptions regarding the difference in vocal tract length (VTL) between the groups of speakers (VTL normalisation, VTLN). Firstly, the VTL of these groups h...

Iterative MMSE Estimation of Vocal Tract Length Normalization Factors for Voice Transformation

We present a method that determines the optimal configuration of a bilinear vocal tract length normalization function to transform the frequency axis of one voice according to a specific target voice. Given a number of parallel utterances of the involved speakers, the single parameter of this function can be calculated through an iterative procedure by minimizing an objective error measure defi...

Cepstral and linear prediction techniques for improving intelligibility and audibility of impaired speech

Human speech becomes impaired, i.e., unintelligible, due to a variety of reasons that can be either neurological or anatomical. The objective of the research was to improve the intelligibility and audibility of the impaired speech that resulted from a disabled human speech mechanism with impairment in the acoustic system: the supra-laryngeal vocal tract. For this purpose three methods are presente...

Publication date: 2011